Extreme Value Analysis Without the Largest Values: What Can Be Done?

Richard Davis, Department of Statistics, Columbia University, USA

In this paper we are concerned with the analysis of heavy-tailed data when a portion of the extreme values is unavailable. This research was motivated by an analysis of the degree distributions in a large social network. The degree distributions of such networks tend to have power law behavior in the tails. We focus on the Hill estimator, which plays a starring role in heavy-tailed modeling. For these data, the Hill estimator exhibited a smooth and increasing “sample path” as a function of the number of upper order statistics used in constructing the estimator. This behavior became more apparent as we artificially removed more of the upper order statistics. Building on this observation, we introduce a new version of the Hill estimator. It is a function of the proportion theta of the upper order statistics used in the estimation, and it also depends on the proportion delta of unavailable extreme values. We establish functional convergence of the normalized Hill estimator to a Gaussian process. Based on this limit theory, we develop a procedure for estimating the number of missing extremes together with extreme value parameters, including the tail index and the bias of Hill’s estimate. We illustrate how this approach works in both simulations and real data examples.
This is joint work with Jingjing Zou and Gennady Samorodnitsky.
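
The sample-path behavior described in the abstract can be explored with a minimal sketch in Python. It computes the classical Hill estimator, not the new version introduced in the talk, and traces it over a grid of values of k (the number of upper order statistics used), before and after artificially removing the largest observations. The function names (hill_estimator, hill_path, drop_largest), the simulated Pareto sample, the tail index 2, the sample size, and the number of removed values are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def hill_estimator(x, k):
    """Classical Hill estimate of gamma = 1/alpha from the k largest values.

    Averages log(X_(i)) - log(X_(k+1)) over i = 1..k, where
    X_(1) >= X_(2) >= ... are the descending order statistics.
    """
    xs = np.sort(np.asarray(x, dtype=float))[::-1]  # descending order
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < len(x)")
    return float(np.mean(np.log(xs[:k]) - np.log(xs[k])))


def hill_path(x, ks):
    """Hill estimates evaluated over a grid of k values (the 'sample path')."""
    return np.array([hill_estimator(x, k) for k in ks])


def drop_largest(x, m):
    """Return the sample with its m largest observations removed."""
    xs = np.sort(np.asarray(x, dtype=float))
    return xs[: len(xs) - m] if m > 0 else xs


# Illustrative data (assumption): exact Pareto sample with tail index alpha = 2,
# so the target value of the Hill estimator is gamma = 1/alpha = 0.5.
rng = np.random.default_rng(0)
alpha = 2.0
sample = rng.pareto(alpha, size=5000) + 1.0  # shift NumPy's Lomax draw to a Pareto on [1, inf)

ks = np.arange(10, 501, 10)
full_path = hill_path(sample, ks)                          # all extremes available
truncated_path = hill_path(drop_largest(sample, 100), ks)  # top 100 values removed

for k, full, trunc in zip(ks[::10], full_path[::10], truncated_path[::10]):
    print(f"k={k:4d}  full={full:.3f}  top-100 removed={trunc:.3f}")
```

In this simulated setting, the path computed after removing the largest values should start well below the path from the full sample and rise as k grows, which is the qualitative pattern the abstract describes. The sketch does not estimate the number of missing extremes or correct the bias; those steps rely on the limit theory developed in the paper.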